Placement Service
This document describes the PlacementService, which uses an LLM to extract placement offers from email sources. It explains how placement offers are detected, how structured data is extracted, and how the service integrates with Google Gemini for content processing. It covers the placement offer schema, the email processing pipeline for placement notifications, LLM prompt engineering, examples of different offer types, error handling, and integration with the notification system that delivers updates to users.
The placement system is composed of modular services that work together to fetch, process, validate, and publish placement offers:
Email ingestion via Google Groups IMAP client
LangGraph-based processing pipeline with classification, extraction, validation, and privacy sanitization
Structured data models for placement offers and notifications
Database persistence and retrieval
Notification routing to Telegram and other channels
PlacementService: Orchestrates the end-to-end pipeline using LangGraph, integrating IMAP email fetching, LLM-based extraction, validation, and privacy sanitization.
GoogleGroupsClient: Provides IMAP connectivity and email parsing for placement group feeds.
DatabaseService: Manages MongoDB collections for notices, jobs, placement offers, and user data.
PlacementNotificationFormatter: Transforms placement events into notification-ready notices.
NotificationService: Routes formatted notices to Telegram and other channels.
EmailNoticeService: Separate pipeline for non-placement notices, in contrast to the placement offer pipeline.
Key data models:
PlacementOffer, RolePackage, Student: Structured representation of placement offers.
NewOfferEvent, UpdateOfferEvent: Event structures for new and updated offers.
NoticeDocument: Standardized notice format for publishing.
The PlacementService implements a four-stage LangGraph pipeline:
Classification: Keyword-based relevance scoring to filter placement-related emails.
Extraction: LLM-based structured extraction with robust retry logic.
Validation: Schema validation and enhancement (role/student mapping, package normalization).
Privacy Sanitization: Removal of headers, forwarded markers, and sensitive metadata.
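The four stages can be pictured as functions that each take and return a shared state dictionary, with classification and validation acting as gates. The sketch below is a minimal illustration in plain Python rather than LangGraph; the state keys, the stage bodies, and the `run_pipeline` helper are hypothetical and are not taken from the service's code.

```python
from typing import Any, Dict

State = Dict[str, Any]


def classify(state: State) -> State:
    # Hypothetical keyword check standing in for the real relevance scoring.
    text = state["body"].lower()
    state["is_relevant"] = any(word in text for word in ("placed", "offer"))
    return state


def extract(state: State) -> State:
    # Stand-in for the LLM call; the real stage prompts Gemini for structured data.
    state["offer"] = {
        "company": state.get("company_hint"),
        "additional_info": state["body"],
    }
    return state


def validate(state: State) -> State:
    # Reject offers that lack a company name.
    state["is_valid"] = bool(state["offer"].get("company"))
    return state


def sanitize(state: State) -> State:
    # Drop lines that look like mail headers or forwarded markers.
    kept = [
        line
        for line in state["offer"]["additional_info"].splitlines()
        if not line.lower().startswith(("from:", "to:", "---------- forwarded"))
    ]
    state["offer"]["additional_info"] = "\n".join(kept)
    return state


def run_pipeline(state: State) -> State:
    # Each stage runs only if the previous gate did not reject the email.
    stages = [
        (classify, "is_relevant"),
        (extract, None),
        (validate, "is_valid"),
        (sanitize, None),
    ]
    state["trace"] = []
    for stage, gate in stages:
        state = stage(state)
        state["trace"].append(stage.__name__)
        if gate is not None and not state[gate]:
            break
    return state
```

In a LangGraph workflow the same gating would typically be expressed with conditional edges between nodes; the loop here only illustrates the control flow.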
PlacementService
The PlacementService encapsulates the LangGraph pipeline and integrates with external systems:
Initialization: Configures LLM (Gemini), builds the workflow graph, and prepares dependencies.
Classification: Computes a confidence score using placement-related keywords, company indicators, presence of names/emails, and negative indicators.
Extraction: Invokes the LLM with a strict prompt template to validate final placement offers and extract structured data; includes retry logic on validation errors.
Validation: Ensures required fields (company, roles, students), normalizes package values, and assigns default roles/packages when possible.
Privacy Sanitization: Removes headers, forwarded markers, and sensitive metadata from extracted fields.
Persistence: Uses DatabaseService to upsert offers and emit events for notification formatting.
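The classification step described above combines positive and negative signals into one confidence value. A minimal sketch of that idea follows; the keyword lists, the weights, and the `classification_score` name are assumptions chosen for illustration, and the substring matching is deliberately crude.

```python
import re

# Hypothetical signal lists; the real service's keywords are not shown in this document.
PLACEMENT_KEYWORDS = ("placed", "selected", "offer", "ctc", "package")
COMPANY_INDICATORS = ("pvt", "ltd", "technologies")
NEGATIVE_INDICATORS = ("registration open", "deadline", "apply by")
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def classification_score(subject: str, body: str) -> float:
    """Return a relevance score in [0, 1] for a single email."""
    text = f"{subject}\n{body}".lower()
    score = 0.0
    # Each placement keyword found adds a fixed amount (assumed weight).
    score += 0.15 * sum(keyword in text for keyword in PLACEMENT_KEYWORDS)
    # A company-like token and an email address each add a small bonus.
    if any(indicator in text for indicator in COMPANY_INDICATORS):
        score += 0.1
    if EMAIL_PATTERN.search(text):
        score += 0.1
    # Negative indicators (for example drive announcements) pull the score down.
    score -= 0.3 * sum(indicator in text for indicator in NEGATIVE_INDICATORS)
    return max(0.0, min(1.0, score))
```

An email would then be passed to extraction only when its score clears some threshold; the threshold value is a tuning choice that this document does not specify.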
Email Processing Pipeline
The pipeline is designed specifically for placement notifications:
Email fetching: IMAP-based retrieval of unread emails from the placement group.
Content parsing: Decodes subject, sender, and body; extracts forwarded sender/date metadata.
Classification: Keyword scoring and negative indicators to determine relevance.
Extraction: Strict LLM prompt ensures only final placement offers are processed.
Validation: Schema enforcement and data normalization.
Privacy: Sanitization of headers and forwarded metadata.
Storage: Upsert into placement offers collection with event emission for notifications.
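Content parsing and forwarded-metadata extraction can be sketched with Python's standard `email` package. This is an illustration only, not the GoogleGroupsClient code: the `parse_email` function, the keys it returns, and the Gmail-style "Forwarded message" marker are assumptions.

```python
import re
from email import message_from_string
from email.header import decode_header, make_header

# Header-like lines inside the forwarded block (assumed Gmail-style layout).
FORWARDED_FROM = re.compile(r"^From:[ \t]*(.+)$", re.MULTILINE)
FORWARDED_DATE = re.compile(r"^Date:[ \t]*(.+)$", re.MULTILINE)


def _plain_text(message) -> str:
    # Return the first text/plain part, decoded with its declared charset.
    for part in message.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""


def parse_email(raw: str) -> dict:
    message = message_from_string(raw)
    body = _plain_text(message)
    parsed = {
        # decode_header handles RFC 2047 encoded subjects.
        "subject": str(make_header(decode_header(message.get("Subject", "")))),
        "sender": message.get("From", ""),
        "body": body,
        "forwarded_sender": None,
        "forwarded_date": None,
    }
    marker = body.find("Forwarded message")
    if marker != -1:
        tail = body[marker:]
        sender = FORWARDED_FROM.search(tail)
        date = FORWARDED_DATE.search(tail)
        parsed["forwarded_sender"] = sender.group(1).strip() if sender else None
        parsed["forwarded_date"] = date.group(1).strip() if date else None
    return parsed
```

Keeping the forwarded sender and date separate from the outer headers lets later stages attribute the offer to its original author while still stripping the forwarded block from stored text.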
Placement Offer Schema
The system defines a comprehensive schema for placement offers:
Company: Name of the organization.
Roles: List of roles with associated packages and package details.
Job Location: Optional locations.
Joining Date: ISO date string.
Students Selected: List of students with names, enrollment numbers, emails, roles, and packages.
Number of Offers: Count of selected students.
Additional Info: Supplementary details.
Email Metadata: Subject, sender, and time sent.
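The fields listed above map naturally onto three nested record types. The service uses Pydantic models; the sketch below uses standard-library dataclasses so that it runs without dependencies. The attribute names are assumptions inferred from the field list, and modelling the number of offers as a derived property is an illustrative choice rather than the documented behaviour.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RolePackage:
    role: str
    package: Optional[float] = None  # assumed to be expressed in LPA
    package_details: Optional[str] = None


@dataclass
class Student:
    name: str
    enrollment_number: Optional[str] = None
    email: Optional[str] = None
    role: Optional[str] = None
    package: Optional[float] = None


@dataclass
class PlacementOffer:
    company: str
    roles: List[RolePackage]
    students_selected: List[Student]
    job_location: List[str] = field(default_factory=list)  # optional locations
    joining_date: Optional[str] = None  # ISO date string
    additional_info: str = ""
    email_subject: Optional[str] = None
    email_sender: Optional[str] = None
    time_sent: Optional[str] = None

    @property
    def number_of_offers(self) -> int:
        # Derived from the student list so the count cannot drift out of sync.
        return len(self.students_selected)
```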
LLM Prompt Engineering
The LLM prompt enforces strict validation before extraction:
Phase 1: Classification of final placement offers with explicit criteria (package presence, finality, placement status, training/PPO handling).
Phase 2: Extraction into a strict schema with privacy rules forbidding headers and forwarded metadata.
Package normalization rules: LPA conversion, range handling, stipend vs. CTC distinctions, and conditional offers.
Retry logic: Validates JSON and Pydantic models, with up to three attempts.
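The retry behaviour can be sketched as a loop that re-invokes the model until the response parses as JSON and passes a schema check. In this minimal sketch the `call_llm` callable, the `extract_with_retry` and `strip_code_fence` helpers, and the required-key check (a stand-in for Pydantic validation) are all hypothetical.

```python
import json
from typing import Callable, Optional

MAX_ATTEMPTS = 3
REQUIRED_KEYS = ("company", "roles", "students_selected")
FENCE = "`" * 3  # markdown code fence that models sometimes wrap JSON in


def strip_code_fence(text: str) -> str:
    # Remove a leading fence (optionally tagged "json") and a trailing fence.
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        cleaned = cleaned[len(FENCE):]
        if cleaned.startswith("json"):
            cleaned = cleaned[len("json"):]
        if cleaned.endswith(FENCE):
            cleaned = cleaned[:-len(FENCE)]
    return cleaned.strip()


def extract_with_retry(call_llm: Callable[[str], str], prompt: str) -> Optional[dict]:
    for _ in range(MAX_ATTEMPTS):
        raw = call_llm(prompt)
        if not raw.strip():
            # An empty response is treated as "not a placement offer".
            return None
        try:
            data = json.loads(strip_code_fence(raw))
        except json.JSONDecodeError:
            continue  # malformed JSON: ask the model again
        # Stand-in for model validation: all required fields must be non-empty.
        if isinstance(data, dict) and all(data.get(key) for key in REQUIRED_KEYS):
            return data
    return None
```

Returning `None` after the final attempt lets the caller skip the email instead of persisting a partially valid offer.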
Examples of Processing Different Types of Placement Offers
The system handles diverse offer types:
Full-time offers with CTC and benefits.
Internship-only offers with stipend conversion to LPA.
Conditional offers (PPOs) with final CTC.
Training programs leading to FTE with package disclosure.
Multiple roles with different packages.
Examples are visible in the stored placement offers dataset.
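To make the package handling for these offer types concrete, the sketch below converts a few textual package formats to LPA. It is a hedged illustration, not the service's normalization code: the `to_lpa` helper is hypothetical, a monthly stipend is assumed to be annualized by multiplying by 12, and one lakh is taken as 100,000 rupees. Ranges resolve to their lowest value.

```python
import re
from typing import Optional

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def to_lpa(text: str) -> Optional[float]:
    """Convert a package description to lakhs per annum, or None if no number is found."""
    cleaned = text.lower().replace(",", "")
    numbers = [float(match) for match in NUMBER.findall(cleaned)]
    if not numbers:
        return None
    lowest = min(numbers)  # a range such as "8-10" resolves to its lowest value
    if "lpa" in cleaned or "lakh" in cleaned:
        return lowest
    if "month" in cleaned:
        # Assumed stipend rule: monthly rupees * 12 months, expressed in lakhs.
        return round(lowest * 12 / 100_000, 2)
    # Otherwise assume an annual amount in rupees.
    return round(lowest / 100_000, 2)
```

For example, `to_lpa("8-10 LPA")` gives `8.0` and `to_lpa("Rs. 50,000 per month")` gives `6.0`.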
Integration with Notification System
Placement events trigger formatted notices:
New Offer: Creates a notice summarizing total placements and role breakdowns.
Update Offer: Adds newly placed students and updates totals.
NotificationService broadcasts notices to Telegram and other channels.
DatabaseService persists notices and tracks sent status.
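A new-offer notice of the kind described above could be assembled as follows. This is a minimal sketch: the `format_new_offer_notice` function, the student dictionary keys, and the message layout are assumptions and do not reproduce the PlacementNotificationFormatter output.

```python
from collections import Counter
from typing import Dict, List


def format_new_offer_notice(company: str, students: List[Dict[str, str]]) -> str:
    # Count students per role; students without a role fall into a catch-all bucket.
    roles = Counter(student.get("role") or "Unspecified role" for student in students)
    lines = [f"{company}: {len(students)} student(s) placed"]
    # Sort roles alphabetically so the notice text is deterministic.
    for role, count in sorted(roles.items()):
        lines.append(f"- {role}: {count}")
    return "\n".join(lines)
```

The returned text would then be wrapped in a notice document and handed to the notification layer for delivery.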
The PlacementService depends on:
GoogleGroupsClient for email ingestion.
LangChain ChatGoogleGenerativeAI for LLM processing.
Pydantic models for validation.
DatabaseService for persistence and event emission.
NotificationService and PlacementNotificationFormatter for publishing.
Retry Strategy: Up to three retries for extraction/validation failures to mitigate transient LLM issues.
Lazy IMAP Usage: Fetches email IDs first, then processes emails individually to reduce memory overhead.
Package Normalization: Converts all amounts to LPA and handles ranges conservatively (lowest quantifiable value).
Forwarded Metadata Extraction: Robust parsing of forwarded dates and senders to maintain accurate timestamps and attribution.
Database Upserts: Merge logic for offers avoids duplicate records while still generating events for updates.
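The upsert-and-emit behaviour can be illustrated with an in-memory dictionary standing in for the MongoDB collection. Everything here is an assumption made for the sketch: the `upsert_offer` helper, matching offers by lower-cased company name, identifying students by enrollment number, and the shape of the returned event.

```python
from typing import Dict, Optional


def upsert_offer(store: Dict[str, dict], offer: dict) -> Optional[dict]:
    """Insert or merge an offer and return an event dict, or None if nothing changed."""
    key = offer["company"].strip().lower()
    existing = store.get(key)
    if existing is None:
        # First time this company is seen: store it and emit a new-offer event.
        students = list(offer["students_selected"])
        store[key] = {"company": offer["company"].strip(), "students_selected": students}
        return {
            "type": "new_offer",
            "company": store[key]["company"],
            "added": len(students),
            "total": len(students),
        }
    # Merge: append only students whose enrollment number is not already stored.
    known = {student["enrollment_number"] for student in existing["students_selected"]}
    new_students = [
        student
        for student in offer["students_selected"]
        if student["enrollment_number"] not in known
    ]
    if not new_students:
        return None  # duplicate email: no write, no event
    existing["students_selected"].extend(new_students)
    return {
        "type": "update_offer",
        "company": existing["company"],
        "added": len(new_students),
        "total": len(existing["students_selected"]),
    }
```

Returning `None` for a pure duplicate keeps repeated emails from triggering repeat notifications, while a partial overlap still produces an update event for the newly placed students.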
Common issues and resolutions:
Empty LLM Responses: Treated as non-placement offers; review prompt clarity and content sanitization.
Validation Errors: Inspect missing company names, roles, or student lists; ensure single-role defaults are applied.
JSON Parsing Failures: Verify LLM output formatting; ensure raw JSON is returned without markdown wrappers.
IMAP Connectivity: Confirm environment variables for email and app password; verify server configuration.
Privacy Sanitization: Ensure headers and forwarded markers are removed; confirm additional_info and package details are cleaned.
Notification Delivery: Check unsent notices and channel configurations; verify database marking as sent.
The PlacementService provides a robust, LLM-powered pipeline for extracting and publishing placement offers from email sources. Its strict classification and extraction prompts, combined with validation and privacy sanitization, ensure high-quality, normalized data. The integration with the notification system enables timely delivery of placement updates to users, while the database layer supports historical tracking and statistical reporting.